%\documentclass[10pt,conference,letterpaper]{IEEEtran}
\documentclass{sig-alternate}
\usepackage{times}
%\usepackage[english]{algorithm2e}
\usepackage{algorithm}
\usepackage{algpseudocode}
%\usepackage[named]{algo}
%\algref{<algorithm>}{<line>}
\newtheorem{theorem}{Theorem}
%\newcounter{Observation}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{Observation}[theorem]{Observation}
\def\candidate{{\cal C}}
\def\comment#1{}
\usepackage{graphicx}
\input{psfig}


\pagestyle{empty}


\special{pdf: pagesize width 8.5in height 11in}

\begin{document}

\conferenceinfo{SIGMOD'11,} {June 12--16, 2011, Athens, Greece.}
\CopyrightYear{2011} \crdata{978-1-4503-0661-4/11/06}
\clubpenalty=10000 \widowpenalty = 10000

% ****************** TITLE ****************************************

\title{Finding Semantics in Time Series}

\numberofauthors{3}

%\author{%
%% author names are typeset in 11pt, which is the default size in the author block
%{Peng Wang{\small $~^{1,2}$}\titlenote{The work was done when the author visited Microsoft Research Asia}\hspace{1cm}Haixun Wang{\small $~^{2}$}\hspace{1cm}Wei Wang{\small $~^{1}$} }%
%% add some space between author names and affils
%\vspace{1.6mm}\\
%\fontsize{10}{10}\selectfont\itshape
%$^{1}$Fudan University, China\\
%%Address Including Country Name\\
%% \fontsize{9}{9}\selectfont\ttfamily\upshape
%% \{$~^{1}$pengwang5,$~^{3}$weiang1\}@fudan.edu.cn\\
%%$~^{3}$third.author@first-third.edu%
%% add some space between email and affil
%%\vspace{1.2mm}\\
%\fontsize{10}{10}\selectfont\rmfamily\itshape
%$^{2}$Microsoft Research Asia\\
%% %Address Including Country Name\\
%% \fontsize{9}{9}\selectfont\ttfamily\upshape
%% $~^{2}$haixunw@microsoft.com
%}

\author{
\alignauthor
Peng Wang\titlenote{The work was done at Microsoft Research Asia}\\
       \affaddr{Fudan University}\\
       \email{pengwang5@fudan.edu.cn}
\alignauthor
Haixun Wang\\
       \affaddr{Microsoft Research Asia}\\
       \email{haixunw@microsoft.com}
\alignauthor
Wei Wang\\
       \affaddr{Fudan University}\\
       \email{weiwang1@fudan.edu.cn}
}



\maketitle

\begin{abstract}
  To understand a complex system, we analyze its output or
  its log data. For example, we track a system's resource consumption
  (CPU, memory, message queues of different types, etc.) to help avert
  system failures; we examine economic indicators to assess the
  severity of a recession; we monitor a patient's heart rate or EEG
  for disease diagnosis. Time series data is involved in many such
  applications. Much work has been devoted to pattern discovery from
  time series data, but little of it attempts to use the time series
  to unveil a system's internal dynamics.  In this paper, we go beyond
  learning patterns from time series data. We focus on obtaining a
  better understanding of the data-generating mechanism, and we regard
  patterns and their temporal relations as organic components of this
  hidden mechanism. Specifically, we propose to model time series data
  using a novel pattern-based hidden Markov model (pHMM), which aims
  at revealing a global picture of the system that generates the time
  series data. We propose an iterative approach to refine pHMMs
  learned from the data. In each iteration, we use the current pHMM to
  guide time series segmentation and clustering, which enables us to
  learn a more accurate pHMM.  Furthermore, we propose three pruning
  strategies to speed up the refinement process. Empirical results on
  real datasets demonstrate the feasibility and effectiveness of the
  proposed approach.
\end{abstract}

\renewcommand{\baselinestretch}{.97} \normalsize
\bibliographystyle{abbrv}


\category{H.2.8}{Database Management}{Database Applications}[Data
Mining]


\terms{Algorithms, Performance}

\keywords{Time Series, Hidden Markov Model, Pattern}

\input{introduction2}

\input{preliminary}
\input{initial}
\input{refine}

\input{experiment_sigmod11}
\input{related}

\section{Conclusion}
\label{sec:conclusion}


In this paper, we reveal the dynamics of a complex system by
learning a pattern-based hidden Markov model from the time series
data generated by this system. The biggest difference between a pHMM
and a traditional HMM is that in a pHMM, observations are not given,
but are learned from the data as well. We propose an approach to learn
patterns (observations) and the model simultaneously. Furthermore,
we propose three pruning strategies to speed up the learning
process. With a pHMM, we are able to perform pattern-based tasks, such
as trend prediction and pattern-based correlation detection.
Empirical results on real datasets demonstrate the feasibility and
effectiveness of the
proposed approach. % In our future work, we plan to extend it to
% multiple dimensional datasets, and stream applications.

{\renewcommand{\baselinestretch}{.95} \normalsize
\bibliography{haixun}

}


\end{document}
